A primer to frequent itemset mining for bioinformatics

Authors

  • Stefan Naulaerts
  • Pieter Meysman
  • Wout Bittremieux
  • Trung-Nghia Vu
  • Wim Vanden Berghe
  • Bart Goethals
  • Kris Laukens
Abstract

Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques can efficiently capture the characteristics of (complex) data and summarize them succinctly. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and the association rules derived from it for life scientists. We give an overview of various algorithms and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences.
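To make the shopping-basket example concrete, the following is a minimal sketch of level-wise (Apriori-style) frequent itemset mining with a confidence calculation for a derived association rule. The basket contents, the function names `frequent_itemsets` and `confidence`, and the support threshold are assumptions chosen for illustration; they are not taken from the paper, and real datasets would normally call for an optimized library implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    `min_support` transactions (level-wise, Apriori-style search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: the fraction of
    transactions containing the antecedent that also contain the consequent."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Hypothetical toy data: five supermarket baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]

found = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("confidence(butter -> bread):", confidence(found, {"butter"}, {"bread"}))
```

In a bioinformatics setting the same sketch applies if each "basket" is, for example, the set of annotations attached to one gene or the set of features observed in one sample; that mapping is offered here only as an illustration.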

Similar resources

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

A survey of itemset mining

Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally...

Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees

Itemset mining has been an active area of research due to its successful application in various data mining scenarios including finding association rules. Though most of the past work has been on finding frequent itemsets, infrequent itemset mining has demonstrated its utility in web mining, bioinformatics and other fields. In this paper, we propose a new algorithm based on the pattern-growth p...

Frequent Subtree Mining - An Overview

Mining frequent subtrees from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for the more expressive power of labeled trees to capture the complex relations among data entities. Although frequent subtree mining is a more diffic...

Algorithms for Discovery of Frequent Superset, Rather than Frequent Subset

In this paper, we propose a novel mining task: mining frequent supersets from a database of itemsets, which is useful in bioinformatics, e-learning systems, job-shop scheduling, and so on. A frequent superset is one that contains more transactions than the minimum support threshold. Intuitively, according to the Apriori algorithm, the level-wise discovery starts from 1-itemsets, 2-itemsets, and so f...

Journal title:

Volume 16, issue

Pages -

Publication date: 2015